Analysis of a large-scale weighted network of 
one-to-one human communication 

Jukka-Pekka Onnela
Jari Saramäki
Jörkki Hyvönen
Gábor Szabó
M. Argollo de Menezes
Kimmo Kaski
Albert-László Barabási
János Kertész



Laboratory of Computational Engineering, Helsinki University of Technology,
Finland

Clarendon Laboratory, Physics Department, Oxford University, Oxford, U.K.

Department of Physics and Center for Complex Networks Research, University
of Notre Dame, IN, USA

Center for Cancer Systems Biology, Dana Farber Cancer Institute, Harvard
University, Boston, MA, USA

Department of Theoretical Physics, Budapest University of Technology and
Economics, Budapest, Hungary



Abstract. We construct a connected network of 3.9 million nodes from mobile 
phone call records, which can be regarded as a proxy for the underlying human 
communication network at the societal level. We assign two weights on each edge 
to reflect the strength of social interaction, which are the aggregate call duration 
and the cumulative number of calls placed between the individuals over a period of 
18 weeks. We present a detailed analysis of this weighted network by examining its 
degree, strength, and weight distributions, as well as its topological assortativity 
and weighted assortativity, clustering and weighted clustering, together with 
correlations between these quantities. We give an account of motif intensity and 
coherence distributions and compare them to a randomized reference system. We 
also use the concept of link overlap to measure the number of common neighbors 
any two adjacent nodes have, which serves as a useful local measure for identifying
the interconnectedness of communities. We report a positive correlation between
the overlap and weight of a link, thus providing strong quantitative evidence 
for the weak ties hypothesis, a central concept in social network analysis. The 
percolation properties of the network are found to depend on the type and 
order of removed links, and they can help understand how the local structure 
of the network manifests itself at the global level. We hope that our results 
will contribute to modeling weighted large-scale social networks, and believe that 
the systematic approach followed here can be adopted to study other weighted 
networks. 



1. Introduction & Data

Social networks have been a subject of intensive study since the 1930's. In this 
framework social life consists of the flow and exchange of norms, values, ideas, and 
other social and cultural resources [1], and social action of individuals is affected by the
structure of the underlying network [2]. The structure of social networks is important
then not only from the perspective of the individual, but also from that of the society 
as a whole. However, uncovering the structure of social networks has been constrained 
by the practical difficulty of mapping out interactions among a large number of 
individuals. Social scientists have ordinarily based their studies on questionnaire 
data, typically reaching the order of ~ 10^ individuals [3]. Although the spectrum 
of social interactions that may be probed in this approach is wide, the strength of 
an interaction is often based on recollection and, consequently, is highly subjective. 
However, in the late 1990's a change of paradigm took place [4, 5]. Physicists
became interested in large scale social networks, utilizing electronic databases from
emails [6, 7, 8] to phone records [9], offering unprecedented opportunities to uncover
and explore large-scale social networks [10]. In this scheme the order of $\sim 10^{6}$
individuals may be handled and, although the range of social interactions is narrower, 
in some cases their strengths may be objectively quantifiable. While both approaches 
have their merits, studying large-scale networks has potential to shed light on how 
individual microscopic interactions translate into macroscopic social systems. In 
addition to this being one of the key questions as posed by social scientists in the 
field, it is also the one to which statistical physics in general, and the science of 
complex networks in particular, can make a contribution. 

In this paper we present a detailed analysis of a network constructed from a 
data set consisting of the mobile phone call records of over seven million individuals 
over a period of 18 weeks (126 days), covering approximately 20% of the population 
of the country. For the purpose of retaining customer anonymity, each subscription 
was identified by a surrogate key, guaranteeing that the privacy of customers was 
respected. We kept only voice calls, filtering out all other services, such as voice mail, 
data calls, text messages, chat, and operator calls. We filtered out calls involving 
other operators, incoming or outgoing, keeping only those transactions in which the 
calling and receiving subscriptions are governed by the same operator. This filtering was
needed to eliminate the bias between this operator and other operators, as we have
full access to the call records of this operator but only partial access to the call
records of other operators. We constructed two different networks from the data. In
the first scheme we connected two users with an undirected link if there had been at
least one phone call between them, i.e., i called j or j called i, resulting in a
non-mutual network consisting of $N = 7.2 \times 10^{6}$ nodes and $L = 22.6 \times 10^{6}$ links. However,
many of these calls are one-way, most of which correspond to single events, suggesting
that they typically reach individuals that the caller might not know personally. To
eliminate them, in the second scheme we connected two users with an undirected link
if there had been at least one reciprocated pair of phone calls between them, i.e., i
called j and j called i, resulting in a mutual network with $N = 4.6 \times 10^{6}$ nodes and
$L = 7.0 \times 10^{6}$ links.

The resulting mobile call graph (MCG) naturally captures only a subset of the 
underlying social network, which consists of all forms of social interactions, including 
face-to-face interactions, email and landline communication etc. However, research on 
media multiplexity suggests that the use of one medium for communication between 
two people implies communication via other means as well [11]. Furthermore, in
the absence of directory listings, the mobile phone data is skewed towards trusted 
interactions, i.e., people tend to share their mobile numbers only with individuals 
they trust. Therefore, the MCG can be used as a proxy for the underlying social 
network. 

We can quantify the weight of the link (i, j) by the aggregated time i and j spent 
talking to each other as well as by the total number of calls made between i and j 
over the studied period. These weights are denoted by $w_{ij}^{D}$ (total duration of calls)
and $w_{ij}^{N}$ (total number of calls), respectively, where the former is measured in seconds
(s) and the latter is a dimensionless quantity. 
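
The two construction schemes and the two link weights can be illustrated with a minimal sketch. It assumes call records arrive as (caller, callee, duration in seconds) tuples; the function name `build_call_graphs` and the record layout are hypothetical and are not taken from the data set described here.

```python
from collections import defaultdict

def build_call_graphs(calls):
    """calls: iterable of (caller, callee, duration_seconds) records.

    Returns (non_mutual, mutual): dicts mapping an undirected pair (i, j),
    stored with i < j, to (number_of_calls, total_duration).
    """
    weights = defaultdict(lambda: [0, 0])   # pair -> [w^N, w^D]
    callers = defaultdict(set)              # pair -> who has placed a call
    for caller, callee, duration in calls:
        if caller == callee:
            continue                        # assumption: ignore self-calls
        pair = (min(caller, callee), max(caller, callee))
        weights[pair][0] += 1
        weights[pair][1] += duration
        callers[pair].add(caller)
    non_mutual = {pair: tuple(w) for pair, w in weights.items()}
    # A link is mutual only if both endpoints have called each other.
    mutual = {pair: w for pair, w in non_mutual.items()
              if len(callers[pair]) == 2}
    return non_mutual, mutual

calls = [("a", "b", 60), ("b", "a", 30), ("a", "c", 10), ("a", "b", 5)]
non_mutual, mutual = build_call_graphs(calls)
```

In this toy input the pair (a, c) appears only in the non-mutual network, because c never calls a back.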

This paper is devoted to the study of these weighted, large-scale, one-to-one social 
interaction networks, with emphasis on the mutual over the non-mutual network. We 
adopt a "cookbook approach" by carrying out a systematic analysis of basic and more
advanced network characteristics, and hope that others working on weighted networks
will benefit from our "recipes". We study some of the basic network characteristics in
Section 2 and focus on weighted network characteristics in Section 3. We explore the
coupling between link weight and the surrounding local network topology in Section
4. We have dedicated Section 5 to the study of percolation properties of the network
and, finally, discuss our findings in Section 6.

2. Basic network characteristics 

We start inspecting the network by showing a small sample of it in Fig. 1. The sample
has been extracted from the mutual network by picking a node (source node) at
random and including all nodes in the sample that are within a (topological) distance
of $\ell = 5$ from the source node. This method of sampling is sometimes called snowball
sampling [12]. The color of links corresponds to the strength of each tie in terms of $w_{ij}$.
It appears from this figure that the network consists of small local clusters, and the 
majority of the strong ties (colored in red) seem to be localized within these clusters. 
In some cases nodes connected by a strong link have many common neighbors, but 
there are also strongly connected nodes with few or no common neighbors. 

These two apparently contradictory trends arise as a result of being forced to 
examine a sample of the network as opposed to the entire network. To understand the 
limits of visual inspection, it is important to realize that since the network is a high 
dimensional object, a majority of the nodes will be on the outskirts of the sample. A 
consequence of this is that for most of these nodes we only have partial visibility into 
their neighborhood. Consequently, one can see the full neighborhood for only a small 
minority of nodes in the sample.

Figure 1. A small sample of the network with link weights $w_{ij}$ color coded from
yellow (weak link) to red (strong link).

Figure 2. Number of nodes in the sample $N_s(\ell)$, obtained by snowball sampling,
as a function of extraction distance $\ell$ for several choices of the source node (solid
lines) and their average (dashed line).

Figure 3. A sample of the network, showing the source node (square at the center)
from which sampling was started, the bulk nodes (+), and surface nodes (o).
For surface nodes, which clearly are in the majority, only some of their nearest
neighbors are visible in the sample, while the rest are outside the sample.



Let us elaborate on network sampling. We show in Fig. 2 the number of nodes
in the sample $N_s(\ell)$, obtained using snowball sampling, as a function of extraction
distance $\ell$ for several choices of the source node (solid lines) and their average (dashed
line). Here $N_s(\ell)$ is defined as the number of nodes within a distance $\ell$ from the given
source node. For a sample extracted to distance $x$, we call nodes for which $\ell < x$ bulk
nodes and those with $\ell = x$ surface nodes of the sample. The number of surface nodes clearly
outweighs the number of bulk nodes. This is to be expected since the network behaves
like a high dimensional hypersphere, the volume of which is negligible compared with its
surface area. To a good approximation we can write $N_s(\ell) = A e^{B\ell}$, where $A$ and $B$ are fitting
parameters. In general, the ratio of the number of surface nodes to the total number of
nodes in the sample is $[N_s(\ell) - N_s(\ell - 1)]/N_s(\ell) = 1 - e^{-B}$. Thus, a large majority of nodes
are surface nodes.
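
The bulk/surface distinction can be checked on a toy graph with a short sketch. It assumes the network is held as an adjacency dictionary; the helper names `snowball_sample` and `surface_fraction` are illustrative only, and the ternary tree used as input is a stand-in, not the call network.

```python
from collections import deque

def snowball_sample(adj, source, depth):
    """Breadth-first search: {node: distance} for nodes within `depth` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == depth:
            continue                 # surface node: do not expand further
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def surface_fraction(dist, depth):
    """Fraction of sampled nodes lying exactly at the extraction distance."""
    surface = sum(1 for d in dist.values() if d == depth)
    return surface / len(dist)

# Toy input: a ternary tree of depth 4 (node n has children 3n+1, 3n+2, 3n+3).
adj = {n: [] for n in range(121)}
for n in range(40):
    for child in (3 * n + 1, 3 * n + 2, 3 * n + 3):
        adj[n].append(child)
        adj[child].append(n)

sample = snowball_sample(adj, source=0, depth=3)
fraction = surface_fraction(sample, depth=3)
```

For this tree the sample holds 40 nodes, 27 of which are surface nodes, so even at a branching factor of three about two thirds of the sample has an incompletely visible neighborhood.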

This is clear from another network sample in Fig. 3, in which bulk nodes and
surface nodes are drawn with different markers. It is only for bulk nodes that
we have full visibility of their neighborhood and, consequently, may make unbiased
judgments about the structure of their neighborhoods. Since these nodes are clearly
in the minority, visual inspection of network samples has limited utility.

Figure 4. Cumulative degree distribution $P_>^{\mathrm{net}}(k)$ for the mutual (•) and
non-mutual (o) networks (left) and for their respective largest connected components
$P_>^{\mathrm{LCC}}(k)$ (right). The mutual network is a subgraph of the non-mutual one, and
84.1% of the nodes in the mutual network belong to a single connected component
(LCC), for which the average degree $\langle k \rangle \approx 3.0$.



A basic network characteristic, the degree distribution, is shown in Fig. 4. To
avoid the need for binning, we study the cumulative degree distribution $P_>(k)$, defined
as $P_>(k) = \int_k^{\infty} p(x)\,dx$, where $p(x)$ is the degree probability density function. We denote
the distributions for the whole mutual and non-mutual networks by $P_>^{\mathrm{net}}(k)$, and those of their
respective largest connected components (LCC) by $P_>^{\mathrm{LCC}}(k)$. Note that the mutual
network is a subgraph of the non-mutual one, and the LCC is a subgraph of the whole
network. In the case of the mutual network 84% of the nodes belong to the LCC. In
this case little is left outside the LCC, partly explaining why distributions are almost
identical for the whole network and the LCC.
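
A minimal sketch of how a cumulative degree distribution can be computed without binning follows. It assumes the discrete convention that $P_>(k)$ counts nodes with degree at least $k$, which is one reading of the integral definition above; the function name is hypothetical.

```python
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Return {k: fraction of nodes with degree >= k} for each observed k."""
    n = len(degrees)
    counts = Counter(degrees)
    result = {}
    remaining = n                    # nodes with degree >= current k
    for k in sorted(counts):
        result[k] = remaining / n
        remaining -= counts[k]
    return result

dist = cumulative_degree_distribution([1, 1, 2, 3])
```

Plotting the returned values against k on doubly logarithmic axes gives a curve of the kind shown in Fig. 4.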

In general, the degree distributions are skewed with a fat tail, indicating that 
while most users communicate with only a few individuals, a small minority talks with 
dozens. The noticeable difference between the degree distributions for the mutual and
non-mutual networks is the fatter tail of the latter: while the most connected node in the
LCC of the mutual network has $k_{\max} = 144$, in the LCC of the non-mutual network
$k_{\max} = 34625$. Clearly, the latter cannot correspond to a single individual. However,
it appears plausible that the mutual network is dominated by trusted interactions,
i.e., people tend to share their mobile numbers only with individuals they trust. We
also point out that $k_{\max} = 144$ in the mutual network is very close to the approximate
number of $k = 150$ put forward by Dunbar as a limit on connectivity resulting from
the size of the neocortex in primates [13]. From now on, unless
otherwise mentioned, we shall focus exclusively on the LCC of the mutual network. 

The tail of the degree distribution $P(k)$ for the LCC of the mutual network is
approximated well by a power law of the form $P(k) = a(k + k_0)^{-\gamma}$ with $k_0 = 10.9$
and $\gamma = 8.4$. Note that the value of the exponent is significantly higher than the
value observed for landlines ($\gamma = 2.1$ for the in-degree distribution [14]). For such a
rapidly decaying degree distribution the hubs are few, and therefore many properties
of traditional scale-free networks, from anomalous diffusion [15] to error tolerance [16],
are absent.

Figure 5. Cumulative link weight distributions (left) and cumulative node
strength distributions (right) in the LCC of the mutual network. Link weights
and node strengths are measured in terms of the absolute number of calls made
during the studied period (o), corresponding to $P_>(w^N)$ and $P_>(s^N)$, as well as the
aggregated call duration during the period (•), given by $P_>(w^D)$ and $P_>(s^D)$.



As mentioned in the Introduction, link weights and node strengths are measured
in terms of the absolute number of calls made during the studied period as well as the
aggregated call duration. The associated cumulative distributions are $P_>(w^N)$ and
$P_>(s^N)$ for the number of calls,
and $P_>(w^D)$ and $P_>(s^D)$ for the aggregated call duration, as shown in Fig. 5. Both link
weight distributions are broad so that while the majority of ties correspond to a couple
of calls and a few minutes of air time, a small fraction of users place numerous calls
and spend hours chatting with each other. On average an individual made $\langle s^N \rangle \approx 51.1$
calls and spent $\langle s^D \rangle \approx 8074$ s (135 mins) on the phone. Two connected individuals
spoke on average $\langle w^N \rangle \approx 15.4$ times on the phone, spending altogether $\langle w^D \rangle \approx 2429$ s
(40 mins) talking to one another. These values are summarized in Table 1, which also
lists some higher moments for the distributions. The two weights $w_{ij}^D$ and $w_{ij}^N$ are
strongly correlated as expected, and this is evident in Fig. 6. In the mutual network
Pearson's linear correlation coefficient between $w_{ij}^D$ and $w_{ij}^N$ is 0.70, implying that
variance in $w_{ij}^N$ explains some 50% of variance in $w_{ij}^D$.
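
The step from a correlation coefficient of 0.70 to "some 50% of variance" is the usual coefficient of determination, $r^2 = 0.49$. A small sketch of the computation, using made-up toy vectors rather than the call data, might look as follows; `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])   # toy data, not call weights
explained = r ** 2                          # share of variance explained
```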

The tail of the weight distribution $P(w^D)$ for the LCC of the mutual network
is approximated well by an exponentially truncated power law of the form
$P(w) = a(w + w_0)^{-\beta} \exp(-w/w_c)$ with $w_0 \approx 280$, $\beta = 1.9$, and the cut-off parameter
Wc = 3.4 X 10^. The broad tailed nature of these distributions is rather unexpected, 
given that fat tailed tie strength distributions have been observed mainly in networks 
characterized by global transport processes, such as the number of passengers carried 
by the airline transportation network [17] . the reaction fluxes in metabolic networks 
[18], and packet transfer on the Internet [19]. In all these cases the individual fluxes 
are determined by the global network topology, in which an important property is 
"conservation of mass", i.e., local conservation of passengers, molecules, and data 
packets. Such constraints are not present here and, in addition, social networks are 
expected to be fairly local in nature, meaning that the nature of the link weight 
and strength distributions are non-trivial. This raises the interesting question of the 
extent to which network structure and link weights are correlated in this network and, 
in general, whether their extent of correlation can be used to categorize networks in

Figure 6. Scatter plot of call duration weights $w_{ij}^D$ and number-of-calls weights
$w_{ij}^N$. The two weights are clearly correlated in this random sample of 5000 links,
as well as in the LCC of the mutual network, giving rise to Pearson's linear
correlation coefficient of 0.70 in the latter.

Table 1. Summary of descriptive network statistics. The following terms are
used: whole network (net), largest connected component (LCC), non-mutual
network (NM) and mutual network (M). The superscripts $N$ and $D$ refer
to number-of-calls and aggregate-call-duration based weights and strengths,
respectively.



different classes. We will address the first part of this question in Section 4.

Social networks are expected to be assortative: People with many friends are 
connected to others who also have many friends. This gives rise to degree-degree 
correlations in the network, meaning that the degrees of two adjacent nodes are
not independent. These correlations are completely described by the joint probability
distribution $P(k, k')$, giving the probability that a node of degree $k$ is connected to
a node of degree $k'$. It is more practical, however, to define the average nearest
neighbour degree of a node $v_i$ as $k_{nn,i} = (1/k_i) \sum_{j \in \mathcal{N}(v_i)} k_j$, where $\mathcal{N}(v_i)$ denotes the
neighbourhood of $v_i$. By averaging this over all nodes in the network of a given degree
$k$, one can calculate the average degree of nearest neighbors of nodes with degree $k$, denoted
by $\langle k_{nn}|k \rangle$, which corresponds to $\sum_{k'} k' P(k'|k)$ [20]. The network is said to exhibit
assortative mixing if $\langle k_{nn}|k \rangle$ increases and disassortative mixing if it decreases as a
function of $k$ [21].

We show the average nearest neighbor degree in Fig. 7. We follow Barrat et al.

Figure 7. Average neighbor degree $\langle k_{nn}|k \rangle$, $\langle k_{nn}^N|k \rangle$, and $\langle k_{nn}^D|k \rangle$ (left) and
average neighbor strength $\langle s_{nn}^N|s^N \rangle$ and $\langle s_{nn}^D|s^D \rangle$ (right) in the LCC of the mutual
network. The three markers in the plot on the left correspond to unweighted $\langle k_{nn} \rangle$
(black squares), number-of-calls weighted $\langle k_{nn}^N \rangle$ (o), and call-duration weighted
$\langle k_{nn}^D \rangle$ (•) averages. The markers on the right correspond to number of calls (o)
and total call duration (•).



and use the weighted average nearest neighbor degree to characterize degree-degree
correlations [22], which are written as $k_{nn,i}^N = (1/s_i^N) \sum_{j \in \mathcal{N}(v_i)} w_{ij}^N k_j$ and
$k_{nn,i}^D = (1/s_i^D) \sum_{j \in \mathcal{N}(v_i)} w_{ij}^D k_j$, corresponding to the two weighting schemes. Averaging these
over the network gives $\langle k_{nn}|k \rangle$, $\langle k_{nn}^N|k \rangle$ and $\langle k_{nn}^D|k \rangle$, which measure the effective
affinity to connect with neighbors of a given degree while taking the magnitude of
the interactions into account [22]. The three measures behave very similarly in Fig. 7
and the network is clearly assortative degree-wise such that $\langle k_{nn}|k \rangle \sim k^{\alpha}$ applies with
$\alpha \approx 0.4$.
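
The unweighted and weighted average nearest neighbor degrees defined above can be sketched as follows. The sketch assumes links are stored as a dictionary from node pairs to a single weight, and the function name `neighbor_degree_spectrum` is hypothetical; the five-node input is a toy example.

```python
from collections import defaultdict

def neighbor_degree_spectrum(edges, weighted=False):
    """edges: {(i, j): w}. Returns {k: mean k_nn over nodes of degree k}."""
    nbrs = defaultdict(dict)
    for (i, j), w in edges.items():
        nbrs[i][j] = w
        nbrs[j][i] = w
    degree = {v: len(n) for v, n in nbrs.items()}
    by_degree = defaultdict(list)
    for v, n in nbrs.items():
        if weighted:
            # Weighted form: (1/s_i) * sum_j w_ij k_j
            strength = sum(n.values())
            knn = sum(w * degree[u] for u, w in n.items()) / strength
        else:
            # Unweighted form: (1/k_i) * sum_j k_j
            knn = sum(degree[u] for u in n) / degree[v]
        by_degree[degree[v]].append(knn)
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_degree.items())}

edges = {("a", "b"): 1.0, ("b", "c"): 3.0, ("c", "d"): 1.0, ("c", "e"): 1.0}
plain = neighbor_degree_spectrum(edges)
heavy = neighbor_degree_spectrum(edges, weighted=True)
```

In the toy graph the strong b-c link pulls the weighted value for node b towards the degree of c, which is the effect the weighted measure is meant to capture.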

In addition to degree-degree correlations, which characterize the topology of the
network, we can study correlations between node strengths, where node strength is
given by $s_i = \sum_{j \in \mathcal{N}(v_i)} w_{ij}$. The average nearest neighbour strengths are given by
$s_{nn,i}^N = (1/k_i) \sum_{j \in \mathcal{N}(v_i)} s_j^N$ and $s_{nn,i}^D = (1/k_i) \sum_{j \in \mathcal{N}(v_i)} s_j^D$ which, when averaged over
all nodes in the network with strength approximately equal to $s$, give the average
strength of nearest neighbors $\langle s_{nn}^N|s^N \rangle$ and $\langle s_{nn}^D|s^D \rangle$. Whereas the degrees of two
adjacent nodes are strongly correlated, we find that the strengths of two adjacent
nodes in most cases are not. Fig. 7 shows that the dependence of $\langle s_{nn}^D|s^D \rangle$ on $s^D$
can be divided into two parts, where the independence observed for small $s^D$ crosses
over at $s_x$ ~ 10^ to a linear relationship. This linear region can be understood by
studying the proportion of node strength that is contributed by a single link. It
turns out that for very strong links with $w_{ij}^D$ > 10^, which make up 4.4 % of all links,
the strength of both adjacent nodes is determined almost entirely by the weight of this
single link such that $s_i \approx w_{ij} \approx s_j$ [23]. This explains the linear trend in strength-strength
correlations. The plot for $\langle s_{nn}^N|s^N \rangle$ suggests a qualitatively similar picture,
where the linear trend naturally sets in earlier in terms of the absolute value of $s^N$.

The extent of clustering around a node $i$ is quantified by the (unweighted)
clustering coefficient $C_i = 2t_i/[k_i(k_i - 1)]$, where $t_i$ denotes the number of triangles
around node $i$ [1]. Empirical networks have been found to have fairly high average
clustering coefficients, which can be seen as a manifestation of the presence of three-point

Figure 8. Average (topological) clustering coefficient $\langle C|k \rangle$ (left) and average
weighted clustering coefficients $\langle \tilde{C}|s^N \rangle$ and $\langle \tilde{C}|s^D \rangle$ (right) in the LCC of the
mutual network. The topological clustering coefficient does not depend on
weights, and is presented as a function of degree $k$ (o). In contrast, the weighted
clustering coefficient is presented as a function of node strengths in terms of
number of calls (o) and aggregated call duration (•).

Figure 9. Average strength conditional on degree in terms of the number of calls
$\langle s^N|k \rangle$ (o) and aggregated call duration $\langle s^D|k \rangle$ (•) (left), and average strength
product $s_i s_j$ as a function of degree product $k_i k_j$, denoted by $\langle s_i^N s_j^N|k_i k_j \rangle$ and
$\langle s_i^D s_j^D|k_i k_j \rangle$ (right).



correlations. Typically, one looks at the average clustering coefficient as a function of
degree $\langle C|k \rangle$, known as the clustering spectrum, as shown in Fig. 8. Here $\langle C|k \rangle \sim k^{-1}$,
as is commonly found in many empirical networks [24]. This seems to indicate that
the clustering spectrum does not discriminate very well between different networks, which
motivates us to adopt weighted network characteristics in Section 3.

We have seen above that vertex degree distribution and vertex strength
distribution are very similar in nature, which can be understood by examining
degree-strength correlations. Average strength conditional on degree in terms of the number
of calls $\langle s^N|k \rangle$ and aggregated call duration $\langle s^D|k \rangle$ are shown in Fig. 9. If there were

Figure 10. Scaling of weights $w_{ij}^N$ (o) and $w_{ij}^D$ (•) as a function of degree product
$k_i k_j$ (left) and strength product $s_i s_j$ (right).



no correlations between vertex degree and the weights of the links adjacent to the
vertex, as can be obtained by shuffling the weights of the links, we would expect that
$\langle s|k \rangle \sim k^{\alpha}$ with $\alpha \approx 1$, since $\langle s_i \rangle = k_i \langle w \rangle$, where $\langle w \rangle$ is the average link weight in
the network. However, now we have $\langle s^D|k \rangle \sim k^{\alpha^D}$ where $\alpha^D \approx 0.8$ and $\langle s^N|k \rangle \sim k^{\alpha^N}$
where $\alpha^N \approx 0.9$, indicating that vertex strength grows somewhat more slowly than
vertex degree. This is to say that individuals who talk to a large number of friends,
on average, have slightly less time per friend than those who talk to fewer friends.
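
A sketch of the ingredients of this comparison follows: node degrees and strengths, a weight-shuffled reference, and a scaling exponent estimated as a least-squares slope on doubly logarithmic axes. The fitting procedure is an assumption made for illustration (the text does not state how the exponents were obtained), and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def degrees_and_strengths(edges):
    """edges: {(i, j): weight}. Returns ({node: k}, {node: s})."""
    k, s = defaultdict(int), defaultdict(float)
    for (i, j), w in edges.items():
        for v in (i, j):
            k[v] += 1
            s[v] += w
    return dict(k), dict(s)

def shuffle_weights(edges, seed=0):
    """Permute weights over links; the topology (the keys) is unchanged."""
    weights = list(edges.values())
    random.Random(seed).shuffle(weights)
    return dict(zip(edges, weights))

def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

edges = {("a", "b"): 5.0, ("b", "c"): 1.0, ("c", "a"): 2.0}
k, s = degrees_and_strengths(edges)
reference = shuffle_weights(edges)

# Synthetic check: strengths generated with exponent 0.8 are recovered.
ks = [1, 2, 4, 8, 16]
alpha = loglog_slope(ks, [3.0 * kk ** 0.8 for kk in ks])
```

In practice the slope would be taken over the degree-conditioned averages, once for the empirical weights and once for the shuffled reference, for which an exponent close to one is expected.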

We can study the strength product $s_i s_j$ as a function of degree product $k_i k_j$, the
averages of which are denoted by $\langle s_i^N s_j^N|k_i k_j \rangle$ and $\langle s_i^D s_j^D|k_i k_j \rangle$, shown on the right in
Fig. 9. In the absence of correlations, we would expect that $\langle s_i s_j|k_i k_j \rangle = \langle w \rangle^2 (k_i k_j)$,
giving $\langle s_i s_j|k_i k_j \rangle \sim (k_i k_j)^{\beta}$ with $\beta = 1$. However, we now obtain $\beta \approx 0.4$ and
$\beta \approx 0.7$ for the two weighting schemes, corresponding to sublinear growth. Let us also introduce scaling exponents
for degree products such that $\langle w_{ij}^D|k_i k_j \rangle \sim (k_i k_j)^{\gamma^D}$ and $\langle w_{ij}^N|k_i k_j \rangle \sim (k_i k_j)^{\gamma^N}$, and
for strength products such that $\langle w_{ij}^D|s_i^D s_j^D \rangle \sim (s_i^D s_j^D)^{\delta^D}$ and $\langle w_{ij}^N|s_i^N s_j^N \rangle \sim (s_i^N s_j^N)^{\delta^N}$.
The plots of these quantities are shown in Fig. 10. We find that $\gamma^D \approx -0.2$ and
$\gamma^N \approx -0.1$, indicating that the link weights, whether measured in terms of $w_{ij}^D$ or
$w_{ij}^N$, are practically independent of the degree product $k_i k_j$. This shows that link
weights are not determined by the absolute number of friends (node degrees) of $v_i$
and $v_j$. In contrast, as we will see in Section 4, link weights are dependent on the
relative proportion of common neighbors (link overlap). For the latter exponents we
have $\delta^D \approx \delta^N \approx 0.5$, such that $w_{ij}$ scales as the geometric mean of the strengths of
the adjacent nodes.

Putting these structural properties together, we have seen that the network has a 
very steep degree distribution, resulting in few highly connected nodes, and even they 
are not as connected as hubs in scale-free networks are. The two weights, number of 
calls and aggregate call duration, are strongly correlated, and both yield steep strength 
distributions for nodes. This can be understood in light of the only slightly sublinear 
dependence of strength on degree, governed by the exponent $\alpha$. Topologically the
network is assortative, but not weight-assortative for a large majority of nodes. The
weight of a given link is almost independent of the product of the degrees of adjacent
nodes as governed by the almost vanishing exponent $\gamma$, but depends on the geometric
mean of the strengths of the adjacent nodes as indicated by the exponent $\delta$.

3. Advanced network characteristics 

Study of purely topological properties of networks, as was done in Section 2, is a useful
starting point, but incorporating weights in the analysis is important, as it can enhance
our understanding of the structural properties of the network. This motivates us to
proceed to weighted network characteristics. Here important concepts are subgraph
intensity and subgraph coherence that can be used to study the coupling between
network structure and interaction strengths [25]. The intensity of subgraph $g$ with
vertices $v_g$ and links $\ell_g$ is given by the geometric mean of its weights as

$$ i(g) = \Bigl( \prod_{(ij) \in \ell_g} w_{ij} \Bigr)^{1/|\ell_g|}, \qquad (1) $$

where $|\ell_g|$ is the number of links in $\ell_g$ [25]. Note that the unit of intensity is the
same as the unit of network weights. To characterize the homogeneity of weights in
a subgraph, we define subgraph coherence $q(g)$ as the ratio of the geometric to the
arithmetic mean of the weights as

$$ q(g) = i(g) \, |\ell_g| \Big/ \sum_{(ij) \in \ell_g} w_{ij}. \qquad (2) $$

Here $q(g) \in [0, 1]$ and it is close to unity only if the weights of subgraph $g$ do not differ
much, i.e. are internally coherent [25].
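
As a small numerical illustration of Eqs. (1) and (2), the following sketch computes intensity and coherence from a plain list of positive link weights. The function names are hypothetical, and the geometric mean is taken in log space only to avoid overflow for large weights.

```python
import math

def intensity(weights):
    """Geometric mean of the link weights of a subgraph, Eq. (1)."""
    return math.exp(sum(math.log(w) for w in weights) / len(weights))

def coherence(weights):
    """Ratio of geometric to arithmetic mean of the weights, Eq. (2)."""
    return intensity(weights) * len(weights) / sum(weights)

balanced = [3.0, 3.0, 3.0]     # a triangle with equal weights
skewed = [1.0, 2.0, 4.0]       # a triangle with heterogeneous weights
i_skewed = intensity(skewed)   # cube root of 1 * 2 * 4
q_skewed = coherence(skewed)   # intensity * 3 / (1 + 2 + 4)
```

Equal weights give a coherence of one, while the heterogeneous triangle falls below it.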

The average intensity of subgraph $g$ at node $k$ is given by $\bar{i}_k(g) = (1/n_k) \sum_{g_k} i(g_k)$,
where $\sum_{g_k}$ denotes a sum over all topologically equivalent subgraphs containing node
$k$ and $n_k$ is their number. We can average this over all nodes that participate in at least one
instance of the subgraph, denoted by $\langle i(g) \rangle = (n(g) |v_g|)^{-1} \sum_k \sum_{g_k} i(g_k)$, where $n(g)$ is the
number of subgraphs $g$ in the network and $|v_g|$ is the number of nodes in subgraph $g$. Regarding notation, we
emphasize that $\bar{i}_k(\Delta)$ denotes the mean intensity of triangles around a particular node
$k$, where the mean is taken over all triangles attached to the node, whereas $\langle \bar{i}(\Delta)|s \rangle$
denotes the average taken over all nodes whose strength is approximately $s$. The behavior
of the average intensity of triangles as a function of node strength, $\langle \bar{i}(\Delta)^N|s^N \rangle$ and
$\langle \bar{i}(\Delta)^D|s^D \rangle$, and of the average mean coherence, $\langle \bar{q}(\Delta)^N|s^N \rangle$ and $\langle \bar{q}(\Delta)^D|s^D \rangle$, are shown in
Fig. 11. We find that $\langle \bar{i}(\Delta)^D|s^D \rangle$ scales as a power of $s^D$ with an exponent of approximately 0.5
and $\langle \bar{i}(\Delta)^N|s^N \rangle$ as a power of $s^N$ with an exponent of approximately 0.7. The behavior of the
average mean coherence $\langle \bar{q}(\Delta)^D|s^D \rangle$ is markedly
different from that of the intensity, achieving a maximum at $s^D$ ≈ 10^.

To consider the effect of weights on the clustering properties of the network, we
adopt the definition proposed for a weighted clustering coefficient in [25], leading to

$$ \tilde{C}_i = \frac{2}{k_i (k_i - 1)} \sum_{j,k} (\hat{w}_{ij} \hat{w}_{ik} \hat{w}_{jk})^{1/3} = C_i \, \bar{i}_i(\Delta), \qquad (3) $$

where the sum runs over the triangles at node $i$ and $\bar{i}_i(\Delta)$ denotes the average intensity
of triangles at node $i$. The weights are
normalized by the maximum weight in the network, $\hat{w}_{ij} = w_{ij}/\max(w)$, required
for reasons of compatibility with the topological clustering coefficient, and the
contribution of each triangle depends on all of its edge weights [26, 27]. Note that

Figure 11. Average mean intensity of triangles $\langle \bar{i}(\Delta)^N|s^N \rangle$ and
$\langle \bar{i}(\Delta)^D|s^D \rangle$ (left), and average mean coherence $\langle \bar{q}(\Delta)^N|s^N \rangle$ and $\langle \bar{q}(\Delta)^D|s^D \rangle$
(right) as a function of node strengths $s^N$ (o) and $s^D$ (•).



the weighted clustering coefficient can be written as the product of the unweighted
clustering coefficient and the average intensity of triangles at a node as shown in Eq.
(3). Thus triangles in which each edge weight equals $\max(w)$ contribute unity to the
sum, while a triangle having one link with a negligible weight will have a negligible
contribution to the clustering coefficient. Results are shown in Fig. 8 next to the
unweighted (topological) clustering coefficient. It is clear that the behavior for number
of calls and aggregate duration is very similar. For the duration we assume again that
a crossover sets in at $s^D$ ≈ 10^. Up to this point a power law in $s^D$ with an exponent of
approximately 0.8 gives an acceptable fit to $\langle \tilde{C}|s^D \rangle$. However, the behavior of
$\langle \tilde{C}|s^N \rangle$ cannot really be described by a power law.
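
The relation between the topological and the weighted clustering coefficient can be made concrete with a short sketch. It assumes the network is stored as a nested dictionary of neighbor weights and sums over unordered neighbor pairs, so that each triangle at the node is counted once; the function name `weighted_clustering` is hypothetical.

```python
from itertools import combinations

def weighted_clustering(nbrs, i):
    """nbrs: {node: {neighbor: weight}}. Returns (C_i, weighted C_i)."""
    k = len(nbrs[i])
    if k < 2:
        return 0.0, 0.0
    w_max = max(w for n in nbrs.values() for w in n.values())
    triangles, total = 0, 0.0
    for j, m in combinations(nbrs[i], 2):
        if m in nbrs[j]:                       # j and m close a triangle
            triangles += 1
            product = nbrs[i][j] * nbrs[i][m] * nbrs[j][m] / w_max ** 3
            total += product ** (1.0 / 3.0)    # intensity of this triangle
    pairs = k * (k - 1) / 2
    return triangles / pairs, total / pairs

# Toy network: triangle a-b-c with one strong link, plus a pendant node d.
nbrs = {
    "a": {"b": 8.0, "c": 1.0, "d": 8.0},
    "b": {"a": 8.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"a": 8.0},
}
c_a, cw_a = weighted_clustering(nbrs, "a")
c_b, cw_b = weighted_clustering(nbrs, "b")
```

Node b sits in a single triangle and has a topological coefficient of one, yet its weighted coefficient is only 0.25 because two of the three normalized weights are small.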

The local structure of unweighted networks can be characterized by the
appearance of small subgraphs, which have been related to the functionality of several
networks [28, 29]. This is done by studying the number of times a subgraph of interest
appears in the network, but to draw statistical conclusions about the appearance
frequency of subgraphs, a reference system needs to be specified, which can be seen
as analogous to setting up a null hypothesis $H_0$ in the statistics literature. The
reference system is usually established by rewiring the network while conserving its
degree distribution in order to remove local structural correlations present in the
original network. Statistical significance of motifs is usually measured in terms of
a z-score statistic [29]. Here we have chosen just to provide the number of fully
connected subgraphs up to order $k = 10$ in Table 2 for both the empirical network
and a corresponding Erdős-Rényi network [30].

The motif framework has been generalized to weighted networks [25], with the
motivation of studying the nature of coupling between interaction strengths (link
weights $w_{ij}$) and local network topology (an ensemble of subgraphs $g$). We set
up a weight permuted reference by simply shuffling the weights in the network,
which removes weight correlations while leaving the network topology unaltered.
Any deviation in motif intensities between the empirical and reference system has
a straightforward interpretation: the local organisation of weights in the empirical
network is not random. While the z-score may be generalized to weighted networks as

Order   Empirical count   ER expectation
1       6.3 x 10^6        6.3 x 10^6
2       17 x 10^6         17 x 10^6
3       5.6 x 10^         2.6 x 10^1
4       1.4 x 10^         2.5 x 10^-11
5       2.7 x 10^         1.7 x 10^-29
6       4.5 x 10^         7.8 x 10^-54
7       6.8 x 10^         2.7 x 10^-84
8       799               7.0 x 10^-121
9       61                1.4 x 10^-163
10      2                 2.0 x 10^-212



Table 2. Number of cliques of order $k = 1, 2, \ldots, 10$ in the empirical network
(Empirical count) and their expectation values in a corresponding Erdős-Rényi
network (ER expectation) [30]. Note that $k = 1$ corresponds to the number of
nodes $N = 6282226$ and $k = 2$ to the number of links $L = 16828910$, which are
the same in the empirical and random network. These values of $N$ and $L$ give the
link formation probability in the ER graph as $p = 2L/[N(N-1)] \approx 8.5 \times 10^{-7}$.
The expected number $E[X]$ of subgraphs with $k$ nodes and $l$ links is given by
$E[X] = \binom{N}{k} (k!/a)\, p^{l}$, where, for cliques, $l = k(k-1)/2$ and $a = k!$ is
the number of automorphisms of the subgraph, defined as adjacency-preserving
permutations of its vertices [31]. Here the empirical network is a non-mutual
one formed from the aggregated calls of 12 weeks. Note that subgraphs are
counted multiple times, such that one subgraph of order $k$ contains $k$ subgraphs
of order $k - 1$ and so on. For example, one subgraph with $k = 5$ will also be
counted as five instances of a subgraph of order $k = 4$, and $\binom{5}{3} = 10$ distinct
instances of a subgraph of order $k = 3$. The presence of high-order topological
correlations, as manifested by the existence of cliques beyond order three (triangles)
in the empirical network, makes it starkly different from an ER graph, in which
high-order cliques have an astronomically low probability of being present.
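
The ER expectation column can be checked numerically. The following is a minimal Python sketch, assuming the clique form of the expectation, $E[X] = \binom{N}{k} p^{k(k-1)/2}$; the function name is hypothetical, and logarithms are used because the values underflow ordinary floating point for large $k$.

```python
import math

def expected_cliques_log10(n_nodes: int, n_links: int, k: int) -> float:
    """log10 of the expected number of k-cliques in an ER graph with N nodes and L links."""
    # Link formation probability p = 2L / [N(N - 1)].
    p = 2.0 * n_links / (n_nodes * (n_nodes - 1))
    links_in_clique = k * (k - 1) // 2
    # log of the binomial coefficient C(N, k), via log-gamma to avoid overflow.
    log_binom = (math.lgamma(n_nodes + 1) - math.lgamma(k + 1)
                 - math.lgamma(n_nodes - k + 1))
    return (log_binom + links_in_clique * math.log(p)) / math.log(10)

N, L = 6282226, 16828910  # values quoted in the caption of Table 2
for k in range(3, 11):
    print(k, round(expected_cliques_log10(N, L, k), 2))
```

For $k = 3$ the sketch gives roughly 26 expected triangles, which can be compared with the ER expectation column of Table 2.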



demonstrated in [25], it has the same shortcoming as the z-score has for unweighted
networks, namely, that it is based on just one number characterizing the empirical
network and two numbers (the mean and the standard deviation) characterizing the
reference distribution. We follow here an alternative approach introduced in [32],
which compares the entire intensity distribution $P^{e}(g)$ for subgraphs $g$ in the
empirical network to the intensity distribution $P^{r}(g)$ in the corresponding reference
ensemble. The problem then becomes one of comparing two distributions with one
another, for which several tools are available, such as the standard Kolmogorov-Smirnov
test or the Kullback-Leibler divergence [33]. This approach suggests a shift in
perspective from regarding subgraphs as discrete objects that either exist or not to
a continuum of subgraph intensities and coherences.
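
The comparison described above can be sketched in a few lines of Python. The sketch assumes that intensity is the geometric mean of the link weights of a subgraph and that coherence is the ratio of their geometric to arithmetic mean; all function names and the toy network are hypothetical, and the brute-force triangle search is only suitable for very small graphs.

```python
import math
import random
from itertools import combinations

def intensity(weights):
    """Geometric mean of the link weights of a subgraph (assumed definition)."""
    return math.exp(sum(math.log(w) for w in weights) / len(weights))

def coherence(weights):
    """Geometric over arithmetic mean; equals 1 when all weights are equal."""
    return intensity(weights) * len(weights) / sum(weights)

def shuffle_weights(edge_weights, rng):
    """Permute the weights over a fixed set of links (topology is unchanged)."""
    edges = list(edge_weights)
    weights = list(edge_weights.values())
    rng.shuffle(weights)
    return dict(zip(edges, weights))

def triangle_intensities(edge_weights):
    """Intensity of every triangle; links are keyed by frozenset({u, v})."""
    nodes = sorted({n for e in edge_weights for n in e})
    result = []
    for trio in combinations(nodes, 3):
        links = [frozenset(pair) for pair in combinations(trio, 2)]
        if all(link in edge_weights for link in links):
            result.append(intensity([edge_weights[link] for link in links]))
    return result

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    gap = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Toy network: one heavy triangle (a, b, c) and one light triangle (c, d, e).
edges = {frozenset("ab"): 100, frozenset("bc"): 120, frozenset("ac"): 90,
         frozenset("cd"): 2, frozenset("de"): 3, frozenset("ce"): 1}
rng = random.Random(0)
empirical = triangle_intensities(edges)
reference = shuffle_weights(edges, rng)
print(ks_statistic(empirical, triangle_intensities(reference)))
```

In practice the reference intensities would be pooled over many shuffled realizations before the two distributions are compared.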

Results are shown for intensity in Fig. 12 and for coherence in Fig. 13.
Comparing the subgraph intensity distributions shows that the empirical subgraphs
have considerably higher intensities than their random counterparts. Noting in
particular the logarithmic vertical scale, we see that some high-intensity subgraphs
can be 10-1000 times more frequent in the empirical network than in the reference
ensemble. Especially for the larger subgraphs, e.g. $k = 6$, there are some extremely
high-intensity subgraphs in the empirical network that are never created randomly in
the reference ensemble. Similarly, the subgraphs in the empirical network are more
coherent than their randomized counterparts. The differences become larger as we

Figure 12. Distribution of subgraph intensity based on aggregate call duration
weights for cliques $g$ of order $k = 3, 4, 5, 6$ in the LCC of the empirical mutual
network (solid blue squares) and in a reference ensemble (open red squares).
The number of subgraphs of intensity $i$ in the empirical network is given by
$n^{e}(g, i)$ and their average number in 100 realizations of the ensemble by
$n^{r}(g, i)$. A realization of the ensemble is obtained by shuffling the weights in the
empirical network while keeping its topology fixed. Note that both horizontal and
vertical scales vary between the panels.



move to more complex subgraphs, the reason being that it is increasingly unlikely
to create coherent subgraphs with many links by chance. Putting the results on
intensity and coherence together, link weights within cliques are higher and more
similar in magnitude than expected in a randomized reference system. Consequently,
there are important correlations between local network structure at the level of cliques,
or communities, and interaction strengths within them.

4. Single link properties 

Let us now move from subgraphs to study the properties of links and their immediate 
neighborhood. We quantify the topological overlap of the neighborhood of two 
connected nodes i and j by the relative overlap of their common neighbors, defined as 



\[ O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}, \]

where $n_{ij}$ is the number of neighbors common to both nodes $i$ and $j$ [23]. It is worth
pointing out that this is similar, but not identical, to the edge-clustering coefficient as
introduced by Radicchi et al. as

\[ C_{ij} = \frac{n_{ij}}{\min(k_i, k_j) - 1}, \]

Figure 13. Distribution of subgraph coherence based on aggregate call duration
weights for cliques of order $k = 3, 4, 5, 6$ in the LCC of the empirical mutual
network, $n^{e}(g, q)$ (solid blue squares), and in the reference ensemble, $n^{r}(g, q)$
(open red squares). Note that both horizontal and vertical scales vary between the
panels.



where $\min(k_i, k_j) - 1$ is the maximum possible number of triangles around the $(i, j)$
edge [34]. The edge-clustering coefficient reflects the probability that a pair of connected
vertices has a common neighbor, whereas overlap is the fraction of common neighbors
a pair of connected vertices has. The reason for using $O_{ij}$ as opposed to $C_{ij}$ is that
the denominator of $C_{ij}$ gives rise to two undesirable features in the context of social
networks. First, consider a subgraph in which vertices $i$ and $j$ are connected only with
a single link such that $k_i = 1$ and $k_j > 1$, where vertex $i$ is a leaf of the network.
We now have $O_{ij} = 0$, indicating that these two individuals have no common friends,
which seems a reasonable conclusion, whereas $C_{ij}$ is either not defined or diverges as
the denominator tends to zero. Second, consider a triangle $(i, j, k)$ such that $k_i = 2$,
$n_{ij} = 1$ and $k_j \geq 2$. If $k_j = 2$, then both $O_{ij} = 1$ and $C_{ij} = 1$. However, if
$k_j > 2$, we still have $C_{ij} = 1$ for all values of $k_j$, whereas $O_{ij} = 1/(k_j - 1)$. That is,
although $i$ and $j$ still have just one common friend ($n_{ij} = 1$), the overlap of their
neighborhoods decreases as vertex $j$ acquires new friends ($k_j$ increases). This is a
reasonable feature of an overlap measure in a social context. Finally, as a general
remark, since the overlap is a property of the link, it has the desirable property that,
unlike $C_{ij}$, it is symmetric with respect to its arguments $k_i$ and $k_j$.
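
The two cases discussed above can be reproduced with a short Python sketch. It assumes the network is stored as a dictionary mapping each node to the set of its neighbors. The function names are hypothetical, and returning 0 for an isolated pair of nodes is a convention chosen here for illustration only.

```python
def build_adjacency(edge_list):
    """Undirected adjacency structure: node -> set of neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def overlap(adj, i, j):
    """O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) for a link (i, j)."""
    n_ij = len(adj[i] & adj[j])
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # denom is zero only for an isolated pair (k_i = k_j = 1); 0 is an assumed convention.
    return n_ij / denom if denom > 0 else 0.0

def edge_clustering(adj, i, j):
    """C_ij = n_ij / (min(k_i, k_j) - 1); None when the denominator is zero."""
    n_ij = len(adj[i] & adj[j])
    denom = min(len(adj[i]), len(adj[j])) - 1
    return n_ij / denom if denom > 0 else None

# Triangle (i, j, c) in which j has two further friends, plus a leaf attached to x.
adj = build_adjacency([("i", "j"), ("i", "c"), ("j", "c"),
                       ("j", "x"), ("j", "y"), ("leaf", "x")])
print(overlap(adj, "i", "j"), edge_clustering(adj, "i", "j"))
print(overlap(adj, "leaf", "x"), edge_clustering(adj, "leaf", "x"))
```

Here $k_i = 2$, $k_j = 4$ and $n_{ij} = 1$, so the overlap is $1/(k_j - 1) = 1/3$ while the edge-clustering coefficient stays at 1; for the leaf link the overlap is 0 and the edge-clustering coefficient is undefined.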

The behaviour of average overlap as a function of absolute link weight $\langle O | w \rangle$
and cumulative link weight $\langle O | P_c(w) \rangle$ is shown in Fig. 14. The cumulative link
weight is defined in the following way. Let $P_<(x) = \int_{0}^{x} p(w)\, dw$, where $p(w)$ is
the probability density function for the link weights (either $w^N$ or $w^D$). We define

Figure 14. Average overlap as a function of absolute link weight $\langle O | w \rangle$ (left)
and cumulative link weight $\langle O | P_c(w) \rangle$ (right) for $w^N$ (○) and $w^D$ (●).



$P_c(w) = \psi \in [0, 1]$ if $P_<^{-1}(\psi) \leq w < P_<^{-1}(\psi + \Delta\psi)$, where $P_<^{-1}(\cdot)$ is the inverse
cumulative distribution function of link weights and $\Delta\psi = 1/50$. Average overlap
$\langle O | w^D \rangle$ increases up to $w^D \approx 10^4$, after which it declines strongly. However,
$\langle O | P_c(w) \rangle$ shows that the declining trend applies to only some 5% of links, resulting
from these individuals communicating predominantly with just one other person, as
explored in [23]. Note that $s^D \approx 10^4$ was the crossover point in the distributions of
$\langle s_{nn}^D | s^D \rangle$ and $\langle C | s^D \rangle$, indicating that the behavior of these high-strength nodes
is different from that of the rest. The high strength of these nodes derives from the
top 5% of heavy links, which also behave in an anomalous way, as discussed in detail
in [23].
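
The binning by cumulative link weight can be illustrated with a rank-based Python sketch. It assumes that links are sorted by weight and split into bins holding equal numbers of links ($1/\Delta\psi = 50$ by default); the function name is hypothetical, and links with tied weights are split between bins arbitrarily.

```python
def average_overlap_by_weight_quantile(weights, overlaps, n_bins=50):
    """Average overlap in each of n_bins equal-population bins of link weight.

    Returns (psi, mean overlap) pairs, where psi is the lower edge of the bin
    expressed as a fraction of links, i.e. the cumulative link weight P_c(w).
    """
    order = sorted(range(len(weights)), key=lambda idx: weights[idx])
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rank, idx in enumerate(order):
        b = rank * n_bins // len(order)  # bin index of the link with this weight rank
        sums[b] += overlaps[idx]
        counts[b] += 1
    return [(b / n_bins, sums[b] / counts[b]) for b in range(n_bins) if counts[b]]

# Synthetic example: 100 links whose overlap grows linearly with weight.
w = list(range(1, 101))
o = [x / 100 for x in w]
print(average_overlap_by_weight_quantile(w, o, n_bins=4))
```

Plotting the second element of each pair against the first gives a curve of the kind shown on the cumulative scale, on which the heaviest 5% of links occupy only the last few bins.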

Could the result concerning overlap $O_{ij}$ vs. link weight $w_{ij}$ be affected by the fact
that the phone call data is from a single operator and, consequently, calls to phone
subscriptions managed by other operators are not included? Let us assume that an
individual in the population has a probability $p = 0.2$ of having a subscription governed
by the operator the data comes from. We assume that the nodes are all identical and
that the probability of a node being governed by the operator is independent of the
probability of its neighbor being governed by the operator. Given these assumptions,
we can interpret $p$ as the probability of a randomly chosen node being governed by
the operator and, consequently, its being included in our network. The probability
for a link to be included in the network is then $p^2$ and that for a triangle is
$p^3$. These probabilities give rise to the expected numbers of nodes, links, and triangles
in the underlying network, $\tilde{N} = N/p = 5N$, $\tilde{L} = L/p^2 = 25L$, and $\tilde{T} = T/p^3 = 125T$,
respectively. These numbers indicate that the expected numbers of links and triangles
in the underlying (unobserved) network, to which we have only partial visibility by
virtue of having a one-operator sample of it, are 25 times the number of links and 125
times the number of triangles in the observed network, respectively. Since the value
of $p$ affects the number of observed nodes, links, and triangles in the sample, it is
important to consider how it may affect overlap $O_{ij}$.
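
The scaling with $p$, $p^2$ and $p^3$ can be verified exactly on a toy graph by enumerating every possible node sample. The Python sketch below does this under the independence assumption stated above; the function name and the example graph are hypothetical, and the enumeration is only feasible for a handful of nodes.

```python
from itertools import combinations, product

def expected_counts_under_node_sampling(nodes, edges, p):
    """Exact expected numbers of retained nodes, links and triangles when each
    node is kept independently with probability p."""
    edge_set = {frozenset(e) for e in edges}
    triangles = [t for t in combinations(nodes, 3)
                 if all(frozenset(pair) in edge_set for pair in combinations(t, 2))]
    exp_nodes = exp_links = exp_tris = 0.0
    for mask in product([False, True], repeat=len(nodes)):
        kept = {n for n, keep in zip(nodes, mask) if keep}
        prob = p ** len(kept) * (1 - p) ** (len(nodes) - len(kept))
        exp_nodes += prob * len(kept)
        exp_links += prob * sum(e <= kept for e in edge_set)      # both end nodes kept
        exp_tris += prob * sum(set(t) <= kept for t in triangles)  # all three nodes kept
    return exp_nodes, exp_links, exp_tris

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("c", "e")]
print(expected_counts_under_node_sampling(nodes, edges, 0.2))
```

With 5 nodes, 6 links and 2 triangles, the expected retained counts are $5p$, $6p^2$ and $2p^3$, so dividing the observed counts by $p$, $p^2$ and $p^3$ recovers the underlying ones.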

To estimate the effect of $p$ on $\langle O | w^D \rangle$, we follow an approach motivated by
the bootstrap technique [35]. We generate a resample of the LCC of our network
by including each node in the resample with probability $p$ and, by varying $p$, obtain
different sample sizes. In the limit of setting $p = 1$ we recover the original network.
The results are shown in Fig. 15. Although lower values of $p$ result in slightly lower

Figure 15. Average link overlap as a function of link weight $\langle O | w^D, p \rangle$ (left) and
cumulative link weight $\langle O | P_c(w^D), p \rangle$ (right) for altogether nine network samples
from the LCC of the mutual network. Three samples were drawn for each value of
$p$, the probability of a node in the initial network being included
in the sample. We used the values $p = 0.8$ (top 3 curves), $p = 0.6$ (middle 3
curves), and $p = 0.4$ (bottom 3 curves), and the corresponding sample sizes were
$N_{p=0.8} \approx 2.6 \times 10^6$, $N_{p=0.6} \approx 1.4 \times 10^6$, and $N_{p=0.4} \approx 0.4 \times 10^6$.



values of $\langle O | w^D \rangle$, its qualitative behavior is fairly insensitive to $p$. The cumulative
plot shows how decreasing $p$ does, in fact, cause the curve to become slightly flatter.
This suggests that if the original network covered a larger fraction of the market
or, alternatively, if data from several phone operators were aggregated, the value of
$\langle O | P_c(w^D) \rangle$ would somewhat increase in absolute terms but, most importantly, its
increasing trend with respect to link weight would possibly become even more pronounced.
In short, the reported relationship between weight $w$ and overlap $O$ is not an artifact
caused by having only a sample of the underlying mobile phone call network.
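
The resampling step can be sketched in Python as follows, assuming the network is held as a dictionary mapping each node to the set of its neighbors. The function names are hypothetical; the overlap would then be recomputed on the LCC of each sample.

```python
import random
from collections import deque

def sample_induced_subgraph(adj, p, rng):
    """Keep each node independently with probability p; keep links between kept nodes."""
    kept = {n for n in adj if rng.random() < p}
    return {n: adj[n] & kept for n in kept}

def largest_component(adj):
    """Node set of the largest connected component, found by breadth-first search."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Toy network: a three-node path and a separate two-node component.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
rng = random.Random(1)
for p in (0.8, 0.6, 0.4):
    sample = sample_induced_subgraph(adj, p, rng)
    print(p, len(sample), len(largest_component(sample)))
```

Setting `p = 1.0` returns the original network, in line with the limit mentioned above.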

A well-known hypothesis from sociology, the weak ties hypothesis of Granovetter,
states that the proportional overlap of two individuals' friendship networks varies
directly with the strength of their tie to one another [36]. According to this hypothesis,
the strength of a tie is a "combination of the amount of time, the emotional intensity,
the intimacy (mutual confiding), and the reciprocal services which characterize the tie"
[36]. The present network is suitable for testing the weak ties hypothesis empirically at
a societal level for two reasons. First, the weights are phone call durations and thus
reflect the time commitment to the relationship, one of the variables suggested to
be indicative of the strength of an interpersonal tie. Second, the size of the network
guarantees sufficient averaging and, therefore, produces reliable statistics. In addition,
using the mutual network entails at least some degree of reciprocity (at least one
call has been returned) and, importantly, commitment of phone time in this case also
implies monetary costs to the caller. The average overlap increases for about 95%
of link weights, as shown in Fig. 14, and the behavior of the remaining 5% can be
accounted for (see Supplementary Material in [23]). Importantly, this increasing trend
is practically unaffected by whether the number of calls $w^N$ or the aggregate call
duration $w^D$ is used as the weight. Put together with the issue of sampling discussed
above, these results provide a societal-level verification of the weak ties hypothesis [23].

The results on overlap can be related to the concept of link betweenness centrality,
defined for a link $e = (i, j)$ as $b_{ij} = \sum_{v \in V_s} \sum_{w \in V \setminus \{v\}} \sigma_{vw}(e) / \sigma_{vw}$, where $\sigma_{vw}(e)$ is the

Figure 16. Cumulative distribution of link betweenness centrality $P_>(b)$ (left)
in the LCC of the mutual network and the average link overlap as a function of
link betweenness centrality $\langle O | b \rangle$ (right). Here $P_>(b)$ has been computed using a
sample of $N_s = 10^5$ starting nodes from which the shortest paths to all the other
$N - 1$ nodes were found in order to calculate the betweenness centrality of links.



number of shortest paths between $v$ and $w$ that contain $e$, and $\sigma_{vw}$ is the total number
of shortest paths between $v$ and $w$ [37]. In practice, we use the algorithm introduced in
[38] to compute $b_{ij}$ but, due to the heavy computational requirements of the algorithm,
instead of using all the nodes of the set $V$ making up the network, we use a subset of
$N_s = 10^5$ nodes in the sample $V_s$ as starting points. We then use the algorithm to find
the shortest paths from these $N_s$ nodes to all the remaining $N - 1$ nodes, every time
keeping track of which links are used in constructing the shortest paths. Note that
using this many source nodes results in on the order of $10^{11}$ shortest paths being
computed in the network, more than a sufficient number, as was confirmed by using a
smaller value for $N_s$. The cumulative distribution of link betweenness centrality is shown
in Fig. 16. The figure also shows the behavior of average link overlap as a function of
link betweenness centrality $\langle O | b \rangle$. This is in full agreement with the above picture of
the role of weak and strong links: weak links have low overlap but high betweenness
centrality, reflecting their importance in holding the system together, while strong
links have high overlap but low betweenness centrality and, as such, unlike the weak
links, are not irreplaceable.
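
The source-sampled estimate of link betweenness can be sketched in Python for an unweighted network. The sketch uses a breadth-first search per source followed by a Brandes-style back-propagation of path fractions; it is offered as an illustration and is not claimed to be the algorithm of [38]. The function name and the toy graph are hypothetical.

```python
from collections import deque

def edge_betweenness_from_sources(adj, sources):
    """Sum over sampled sources v and all targets w of sigma_vw(e) / sigma_vw."""
    score = {}
    for s in sources:
        # Breadth-first search: distances, shortest-path counts, predecessors.
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate path fractions from the farthest nodes towards the source.
        delta = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return score

# Four-node cycle: two equally short paths connect opposite corners.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(edge_betweenness_from_sources(cycle, ["a"]))
```

For a large network the list of sources would be a random sample of nodes, for example `random.sample(sorted(adj), n_s)` with a hypothetical sample size `n_s`, so that the cost grows with the number of sources rather than with the full number of nodes.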

5. Percolation studies 

We now turn to an examination of the implications of link removal on the global 
properties of networks, which has many precedents in the complex network literature 
[16, 37, 39, 40, 41, 42, 43, 44, 45]. However, instead of removing links randomly, we
remove them based on either their weight $w_{ij}$, overlap $O_{ij}$, or betweenness centrality
$b_{ij}$ values. Removal can be carried out in one of two directions, i.e., either starting
from links with low $w_{ij}$, $O_{ij}$, or $b_{ij}$ values and proceeding towards higher ones or,
alternatively, starting from links with high $w_{ij}$, $O_{ij}$, or $b_{ij}$ and proceeding towards
those with lower corresponding values. This thresholding process is governed by the
control parameter $f$, the fraction of removed links, which allows us to interpolate between
the initial connected network ($f = 0$) and a set of isolated nodes ($f = 1$). We study
the response of the network to link removal based on $w_{ij}$, $O_{ij}$, and $b_{ij}$ by monitoring four
quantities as a function of the control parameter, which are (1) the order parameter $R_{LCC}$,
the fraction of nodes in the LCC, (2) the 'susceptibility' $\tilde{S} = \sum_s n_s s^2 / N$, where $n_s$ is the
number of clusters of size $s$, and (3) the average shortest path length $\langle \ell \rangle$. In addition,
we also study the effect of link removal on (4) the average clustering coefficient $\langle C \rangle$.
Differences in the behavior of these quantities reflect the global role different links
have in the network.

The order parameter $R_{LCC}$ is defined as the fraction of nodes in the LCC, i.e.,
the fraction of nodes that can all reach each other through connected paths. We find
that removing links from low $w_{ij}$ to high $w_{ij}$ (red curve), from low $O_{ij}$ to high $O_{ij}$
(red curve), or from high $b_{ij}$ to low $b_{ij}$ (black curve) leads to a sudden disintegration of
the network at $f^w = 0.8$, $f^O = 0.6$, and $f^b = 0.6$, respectively (Fig. 17, row (d)). In
contrast, removing first the high weight, high overlap, or low betweenness centrality
links will shrink the network, but will not precipitously break it apart. This suggests
that weak and strong links, low and high overlap links, and low and high betweenness
centrality links all have different global structural roles in the network. In particular,
it appears that removing low overlap links produces a qualitatively similar response to
removing high betweenness centrality links.

The second row of Fig. 17 (row (e)) shows the behavior of $\tilde{S} = \sum_s n_s s^2 / N$, which is
analogous to magnetic susceptibility in thermal phase transitions and corresponds to the
average component size in the network with the LCC excluded from the summation.
According to percolation theory, if the network collapses via a phase transition at $f_c$,
then $\tilde{S}$ diverges as $f \to f_c$ for an infinite system. A finite signature of such divergence
is clearly visible in these plots upon removing low $w_{ij}$, low $O_{ij}$, or high $b_{ij}$ links,
suggesting that the network disintegrates at this point following a phase transition.
Since the role of weak and strong ties is different at the local level and has important
consequences from the sociological perspective [36], understanding their different global
roles is a pertinent question from the perspective of social network theory
(see Section 1). We have studied the global role of weak and strong links using finite
size scaling (FSS) as reported in [53]. Although different FSS methods yielded slightly
different results, removal of weak links (red curve) leads to a genuine phase transition
at around $f_c^w(\infty) = 0.80$, but there appears to be no phase transition when strong
links are removed first (black curve). This result indicates that weak and strong links
have qualitatively different global roles in social networks.
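
The thresholding procedure and the two percolation observables can be sketched in Python as follows. The sketch assumes that the initial network is connected, so that the number of nodes equals the size of the LCC at $f = 0$, and that links are supplied already ranked in the order of removal; the function names are hypothetical.

```python
from collections import Counter

def component_sizes(nodes, edges):
    """Sizes of connected components in descending order, using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)

def percolation_point(nodes, ranked_edges, f):
    """Remove the first fraction f of ranked_edges; return (R_LCC, S_tilde)."""
    n_removed = round(f * len(ranked_edges))
    sizes = component_sizes(nodes, ranked_edges[n_removed:])
    r_lcc = sizes[0] / len(nodes)
    # The largest component is excluded from the susceptibility sum.
    s_tilde = sum(s * s for s in sizes[1:]) / len(nodes)
    return r_lcc, s_tilde

# Two strongly tied pairs joined by one weak link; weakest link is removed first.
nodes = ["a", "b", "c", "d"]
ranked = [("b", "c"), ("a", "b"), ("c", "d")]  # ascending weight
for f in (0.0, 1 / 3, 1.0):
    print(round(f, 2), percolation_point(nodes, ranked, f))
```

Reversing `ranked` corresponds to removing the strongest links first, and ranking by overlap or betweenness instead of weight gives the other two thresholding schemes.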

While the size of the largest component tells us about the overall connectivity
of the network, it does not convey information about its topology, only that the
$N_{LCC}(f = 0)\, R_{LCC}(f)$ nodes are connected through one or more paths. One way to
characterize the topology of the network is to study the average shortest path length,
denoted by $\langle \ell \rangle$, which is the average number of links on the shortest path connecting
any two vertices within the LCC. Note that as links are removed, the network becomes
fragmented into components, of which we focus only on the largest one, i.e., the LCC
for the given value of the control parameter $f$. Path lengths are also important from
the perspective of network function and efficiency. The existence of a path between
nodes is a necessary but not sufficient condition for there to be a flow of information
between them. This is especially true if the transmission through links is leaky, i.e.
it is possible for information to get lost along the way. Focusing on the role of weak
and strong ties, we find that removal of weak ties increases path lengths more than
removal of strong ties does, although the effect is stronger upon removing low $O_{ij}$ or
high $b_{ij}$ links.
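
For completeness, the average shortest path length within a component can be computed with repeated breadth-first searches, as in the Python sketch below. The function name is hypothetical, and the optional `sources` argument is an assumption added here to allow estimating the average from a sample of starting nodes in a large network.

```python
from collections import deque

def average_shortest_path_length(adj, sources=None):
    """Mean BFS distance from each source to every other node it can reach."""
    total, pairs = 0, 0
    for s in (sources if sources is not None else adj):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Three-node path: distances are 1, 1 and 2 in each direction.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(average_shortest_path_length(path))
```

The adjacency structure passed in should be that of the LCC for the given value of $f$, since distances between different components are undefined.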

Path lengths are also related to the conjecture obtained from the weak ties 

Figure 17. Percolation analysis. Panel (a) shows a small network sample with all
links intact, (b) the same sample with 80% of the low $w_{ij}$ links removed ($f = 0.80$,
red curve), and (c) the sample with 80% of the high $w_{ij}$ links removed ($f = 0.80$,
black curve). Rows (d, e, f, g): Removal of high or low weight $w_{ij}$ links (left
column), overlap $O_{ij}$ links (middle column), or betweenness centrality $b_{ij}$ links
(right column). The links are removed one at a time based on their ranking, such
that the black curves correspond to starting removal from high $w_{ij}$, $O_{ij}$, and $b_{ij}$
links, whereas the red curves represent the opposite, starting removal from low
$w_{ij}$, $O_{ij}$, and $b_{ij}$ links. The fraction of removed links is denoted by $f$. Row (d):
The order parameter $R_{LCC}$, the ratio of the number of nodes present in the LCC
of the network for the given value of $f$ to that present in the LCC for $f = 0$.
Row (e): $\tilde{S} = \sum_s n_s s^2 / N$, corresponding to the average component size in the
network with the LCC excluded from the summation. Row (f): Average shortest
path length $\langle \ell \rangle$ in the LCC of the system for the given value of $f$, which is also
expected to diverge as $f \to f_c$. Row (g): Average clustering coefficient $\langle C \rangle$ in the
network.



hypothesis, according to which communities are locally connected by single weak ties,
and removing these weak ties should therefore increase average path lengths, making
it more difficult to reach people [36]. Our result provides an empirical verification
of the weak ties conjecture. It can also be related to a study dealing with search
in social networks, according to which successful searches are conducted primarily
through intermediate to weak strength ties without requiring highly connected hubs
to succeed [5]. The present results suggest that the success of weak ties for search
might lie in their function as community connectors, enabling one to reach outside of
one's own community and thus expanding the set of individuals who may be reached
through the network.

The average clustering coefficient $\langle C \rangle$ measures the local cliquishness of the
network. Unlike the average shortest path length $\langle \ell \rangle$, which is computed only for
the LCC for the given value of $f$, the average clustering coefficient is computed over
all nodes in the network for which degree $k > 1$. Removing strong links (Fig. 17, row
(g), black curve) leads to a convex clustering curve with an overall lower $\langle C \rangle$ than when
weak links are removed. This happens because the strong links are mostly located in
tightly connected communities where triangles are abundant. Consequently, removing
them decreases the number of triangles and lowers clustering. Removing weak links
(red curve) produces a concave clustering curve which first decreases very slowly.
This is because the weak links are mostly located between communities, acting as
local bridges and, therefore, rarely participate in triangles. Consequently, removing
them has little effect on clustering. However, the difference in behavior for overlap
thresholding is even more drastic. On removing high $O_{ij}$ links, the communities
become shattered very quickly, so that at $f \approx 0.40$ the average clustering coefficient is
close to zero. The opposite happens on removing low $O_{ij}$ links. The average clustering
increases up to $f \approx 0.54$, compatibly with the fact that 53.5% of links in the giant
component have $O_{ij} = 0$, and reaches a value almost as high as $\langle C \rangle \approx 0.80$. This result
demonstrates quantitatively that the network is highly clustered and these clusters, or
communities, can be filtered out reasonably well by removing low $O_{ij}$ links. Again,
removal of high overlap links is qualitatively similar to removing low betweenness
centrality links and vice versa.
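
The averaging convention used for $\langle C \rangle$ (only nodes with degree $k > 1$ contribute) can be made explicit with a short Python sketch. It assumes the standard unweighted local clustering coefficient, i.e., the fraction of neighbor pairs that are themselves linked; the function name and the toy graph are hypothetical.

```python
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient over nodes with degree k > 1."""
    values = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with k <= 1 are left out of the average
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        values.append(2.0 * links / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0

# Triangle (a, b, c) with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(toy))
```

In the toy graph nodes a and b have clustering 1, node c has 1/3, and the pendant node d is excluded, so the average is 7/9.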

Since some community detection algorithms rely on the concept of betweenness
centrality to detect communities [46], our results suggest that it may be possible to use
the concept of overlap to detect communities, at least in social networks. Bearing in
mind that $O_{ij}$ is a local characteristic and can be computed in $O(N)$, whereas $b_{ij}$ is a
global characteristic and takes $O(N^2 \ln N)$ to compute, algorithms relying on $b_{ij}$ could
use $1/O_{ij}$ as a local proxy for $b_{ij}$, potentially leading to significant gains in computing
performance. We note that a modified version of the edge-clustering coefficient
$C_{ij}$ has also been used to replace edge betweenness centrality in a popular method for
finding communities [34]. One could alternatively use $O_{ij}$ without any modifications
and, due to its desirable properties covered in Section 4, it may be better suited for
identifying communities in social networks.

6. Discussion 

Modern technologies enable the study of social networks of unprecedented size. A 
number of such investigations have appeared recently ranging from exploring email 
communication networks [4, 7, 5, 47] to identifying groups and strategies in an
electronic marketplace [48, 49, 50]. In this paper we constructed a network from
mobile phone call records and used both aggregated call durations and the cumulative 
number of calls as a measure of the strength of a social tie. Since the network is 
derived exclusively from one-to-one communication, it can be used as a proxy for 
the underlying human communication network at the societal level which, to our 
knowledge, is the largest weighted social network studied so far.

In prototypical sociological studies the number of investigated individuals is
limited to the order of a hundred [51], although exceptionally, as in the case of the Add
Health database [52] as employed, for example, in [53], tens of thousands of individuals
may be reached using questionnaires. This method enables covering a broad spectrum
of interpersonal relations, although the subjectivity and quantification of interaction
strengths are major problems. In this paper we have followed a complementary
approach by basing the network on a specific type of social interaction, a phone call,
allowing an objective measure of interactions for millions of people. We believe that
studies like this one can provide valuable lessons about the large-scale structure of
societies emerging from microscopic social interactions.

One of our focal points was to explore the relationship between local network
topology and the associated weights. This is particularly important from the point of
view of sociology, where such a relation was hypothesized a long time ago. In
order to test the weak ties hypothesis, we used the concept of link overlap to measure
the coupling between link weight and the overlap of the neighborhood in the vicinity
of the tie. We demonstrated that for 95% of the links the overlap and tie strength
are correlated, verifying the hypothesis at a societal level. Moreover, we found the
link overlap to be negatively correlated with its betweenness centrality, suggesting that
the former can be used as a local proxy for the latter, computationally heavy, global
quantity.

We explored further the role of weights in the network using the concepts
of intensity, coherence, and weighted clustering coefficient. We found correlations
between local network structure at the level of cliques, or communities, and
interaction strengths within them. The weighted clustering coefficient provides an
appropriate tool for probing the strength of clustering due to weights, and may be used
to differentiate between weighted networks that have fundamentally different coupling
between network topology and interaction strengths. We found that the network is
assortative in terms of topology as expected but, rather surprisingly, is not
weight-assortative for a large majority of nodes. Further, the coupling between local network
structure and interaction strengths carries over to the global level. We quantified this
by studying the differences in percolation behavior depending on the properties of
the removed links. Following this approach we also verified the so-called weak ties
conjecture, a global manifestation of the weak ties hypothesis.

The obtained results can be used as a basis for devising weighted models of social
networks. In particular, the relation between topological and statistical properties
should be incorporated in such models. This enables studying collective social
phenomena, such as spreading of information and opinion formation, at a level of
realism and scale not possible in the past. The lessons learnt from this endeavor
are not limited to understanding human societies, but may find application in other
domains as well. Finally, we believe that our systematic approach can be adopted to
study other weighted networks, and the present results can be seen as a reference
against which other networks may be compared.